Label-Imbalanced and Group-Sensitive Classification under Overparameterization

Neural Information Processing Systems

The goal in label-imbalanced and group-sensitive classification is to optimize relevant metrics such as balanced error and equal opportunity. Classical methods, such as weighted cross-entropy, fail when training deep nets to the terminal phase of training (TPT), that is, training beyond zero training error. This observation has motivated a recent flurry of activity in developing heuristic alternatives that follow the intuitive mechanism of promoting larger margins for minorities. In contrast to previous heuristics, we follow a principled analysis explaining how different loss adjustments affect margins. First, we prove that for all linear classifiers trained in TPT, it is necessary to introduce multiplicative, rather than additive, logit adjustments so that the interclass margins change appropriately. To show this, we discover a connection between the multiplicative CE modification and cost-sensitive support-vector machines.
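To make the additive-versus-multiplicative distinction concrete, here is a minimal NumPy sketch of a cross-entropy loss with both kinds of per-class logit adjustment. The parameters delta and iota are placeholders of this sketch, not values prescribed by the paper; in practice they would be set from the class frequencies (e.g., larger delta for minority classes in the multiplicative variant).

    import numpy as np

    def adjusted_cross_entropy(logits, y, delta, iota):
        # logits: (n, C) raw scores; y: (n,) integer labels.
        # delta: (C,) multiplicative adjustments; iota: (C,) additive ones.
        z = delta * logits + iota                   # per-class logit adjustment
        z = z - z.max(axis=1, keepdims=True)        # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

Setting delta to all ones recovers the purely additive adjustment; setting iota to zeros isolates the multiplicative one analyzed in the paper.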


Tight Bounds for Collaborative PAC Learning via Multiplicative Weights

Chen, Jiecao, Zhang, Qin, Zhou, Yuan

Neural Information Processing Systems

We study the collaborative PAC learning problem recently proposed in Blum et al.~\cite{BHPQ17}, in which we have $k$ players who want to learn a target function collaboratively, such that the learned function approximates the target function well on all players' distributions simultaneously. The quality of the collaborative learning algorithm is measured by the ratio between the sample complexity of the algorithm and that of the learning algorithm for a single distribution (called the overhead). We obtain a collaborative learning algorithm with overhead $O(\ln k)$, improving the one with overhead $O(\ln^2 k)$ in \cite{BHPQ17}. We also show that an $\Omega(\ln k)$ overhead is inevitable when $k$ is polynomially bounded by the VC dimension of the hypothesis class. Finally, our experimental study demonstrates the superiority of our algorithm compared with the one in Blum et al.~\cite{BHPQ17} on real-world datasets.
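As a schematic illustration of how multiplicative weights over the $k$ players can drive a collaborative learner, consider the sketch below. It is not the paper's exact algorithm: the reweighting rule, the 0.25 error threshold, and the helper names (base_learner, est_error) are illustrative assumptions.

    import numpy as np

    def collaborative_learn(players, base_learner, est_error, rounds, eta=1.0):
        # players: list of k sampling oracles; base_learner fits a hypothesis
        # on a weighted mixture of the players' distributions; est_error
        # estimates a hypothesis's error on a single player's distribution.
        w = np.ones(len(players))
        hypotheses = []
        for _ in range(rounds):
            h = base_learner(players, w / w.sum())  # learn on weighted mixture
            hypotheses.append(h)
            errs = np.array([est_error(h, p) for p in players])
            w *= np.exp(eta * (errs > 0.25))        # upweight poorly served players
        return hypotheses                           # combine, e.g., by majority vote

The key idea is that players whose distributions are still poorly served get exponentially more weight in later rounds, so the mixture concentrates on the hard distributions.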


On preserving non-discrimination when combining expert advice

Avrim Blum, Suriya Gunasekar, Thodoris Lykouris, Nati Srebro

Neural Information Processing Systems

Discrimination is commonly an issue in applications where decisions need to be made sequentially. The most prominent such application is online advertising, where platforms need to sequentially select which ad to display in response to particular search queries.



Reviews: Tight Bounds for Collaborative PAC Learning via Multiplicative Weights

Neural Information Processing Systems

This paper nearly settles (effectively settles, for a large range of parameters) an obvious question: "what is the overhead (as a function of k, the number of underlying distributions) in the sample complexity of collaborative PAC learning, compared to vanilla PAC learning?" An easy upper bound is a factor of k. Blum et al. established an upper bound of O(log^2 k) on the overhead factor (for k = O(d), where d is the VC dimension of the concept class to learn), and a lower bound of Omega(log k) (for the specific case k = d). The main contribution of this paper is to provide an O(log k) upper bound (again for k = O(d); the upper bound is slightly more complicated for general k) on that ratio; they also generalize the lower bound to hold for any k ≤ d^{O(1)}. The lower bound is obtained by generalizing and "bootstrapping" the lower bound construction of Blum et al., with some changes to handle the base case.


A Simple and Practical Algorithm for Differentially Private Data Release

Neural Information Processing Systems

We present a new algorithm for differentially private data release, based on a simple combination of the Multiplicative Weights update rule with the Exponential Mechanism. Our MWEM algorithm achieves the best known, and nearly optimal, theoretical guarantees, while at the same time being simple to implement and experimentally more accurate on actual data sets than existing techniques.
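A compact sketch of the MWEM loop over a discrete domain, following the combination described above; the even split of the privacy budget eps across T rounds and the variable names are assumptions of this sketch.

    import numpy as np

    def mwem(true_hist, queries, T, eps, rng=np.random.default_rng(0)):
        # true_hist: nonnegative histogram of the private data set;
        # queries: list of 0/1 numpy vectors encoding linear queries q @ hist.
        n = true_hist.sum()
        synth = np.full(true_hist.shape, n / true_hist.size)  # uniform start
        for _ in range(T):
            # Exponential Mechanism: pick a badly approximated query.
            err = np.array([abs(q @ true_hist - q @ synth) for q in queries])
            p = np.exp(eps / (2 * T) * (err - err.max()) / 2)  # sensitivity 1
            i = rng.choice(len(queries), p=p / p.sum())
            # Laplace mechanism: noisy answer to the chosen query.
            m = queries[i] @ true_hist + rng.laplace(scale=2 * T / eps)
            # Multiplicative Weights: nudge the synthetic data toward m.
            synth = synth * np.exp(queries[i] * (m - queries[i] @ synth) / (2 * n))
            synth *= n / synth.sum()                # renormalize total mass
        return synth

Each round spends half its budget share selecting a poorly approximated query and the other half measuring it, then multiplicatively reweights the synthetic histogram toward the noisy measurement.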


Adaptive Decision-Making with Constraints and Dependent Losses: Performance Guarantees and Applications to Online and Nonlinear Identification

Muehlebach, Michael

arXiv.org Artificial Intelligence

We consider adaptive decision-making problems where an agent optimizes a cumulative performance objective by repeatedly choosing among a finite set of options. Compared to the classical prediction-with-expert-advice setup, we consider situations where losses are constrained, and we derive algorithms that exploit the additional structure in optimal and computationally efficient ways. Our algorithm and analysis are instance-dependent; that is, suboptimal choices of the environment are exploited and reflected in our regret bounds. The constraints handle general dependencies between losses (even across time) and are flexible enough to also account for a loss budget, which the environment is not allowed to exceed. The performance of the resulting algorithms is highlighted in two numerical examples, which include a nonlinear and online system identification task.
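For reference, the classical prediction-with-expert-advice baseline that the paper departs from is the exponential-weights (Hedge) update. Below is a minimal sketch of that baseline, not of the paper's constrained algorithm; the step size eta and the normalization of losses to [0, 1] are assumptions here.

    import numpy as np

    def hedge(loss_stream, k, eta=0.1):
        # loss_stream yields a length-k vector of losses in [0, 1] per round.
        w = np.ones(k)
        cumulative = 0.0
        for losses in loss_stream:
            p = w / w.sum()               # probability of picking each option
            cumulative += p @ losses      # expected loss this round
            w *= np.exp(-eta * losses)    # multiplicative (exponential) update
        return cumulative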

